Fix performance of DPF vector #2249

Merged
cbellot000 merged 6 commits into master from fix/perf_issue_dpf_vector
May 6, 2025

Conversation

@cbellot000 cbellot000 commented Apr 30, 2025

Closes #2201

DPF vectors became slow because they always checked whether the vector had been modified, by:

  1. making a deep copy of the vector
  2. comparing the whole vector
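
The two steps above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the actual pydpf-core implementation: `CheckedVector`, `has_changed`, and `commit_if_changed` are hypothetical names, and a Python list stands in for data owned by the DPF server.

```python
import copy


class CheckedVector:
    """Hypothetical vector wrapper that detects modification before committing."""

    def __init__(self, server_data):
        self._server_data = server_data            # stands in for data owned by the server
        self.data = list(server_data)              # local working copy handed to the user
        self._snapshot = copy.deepcopy(self.data)  # step 1: deep copy kept for later comparison

    def has_changed(self):
        # Step 2: compare the whole vector element by element, O(n) per check.
        return self.data != self._snapshot

    def commit_if_changed(self):
        """Write back only when modified; return True if a commit happened."""
        if self.has_changed():
            self._server_data[:] = self.data
            return True
        return False


server_side = [0.0] * 5
vec = CheckedVector(server_side)
print(vec.commit_if_changed())  # False: nothing was modified
vec.data[0] = 1.0
print(vec.commit_if_changed())  # True: the change is written back
```

In this sketch, both the copy and the comparison touch every element, so the cost is paid on every access even when the data is only read.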

Why did performance change between 0.13.6 and master?
Before 0.13.7.dev, it was hard-coded that an in-process DPF vector wouldn't "check modification", and it thus bypassed the DPF server logic. 0.13.7.dev now relies on the DPF server logic to decide whether or not to update a DPF vector.

Proposed fixes

  • DPF vectors are now always "committed" to the DPF server, whether or not they have been modified (checking for modification is very slow)
  • Server version checks are now much faster because the results of previous checks are cached
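
As a rough sketch of the two fixes, assuming nothing about the real pydpf-core internals: `AlwaysCommitVector` and `meets_version` are hypothetical names, a Python list stands in for server-side data, and `functools.lru_cache` is used here as one simple way to cache previous checks.

```python
from functools import lru_cache


def _parse(version):
    """Turn a dotted version string such as '8.1' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


@lru_cache(maxsize=None)
def meets_version(server_version, required):
    """Hypothetical version check; the result is cached per argument pair."""
    return _parse(server_version) >= _parse(required)


class AlwaysCommitVector:
    """Hypothetical vector wrapper that always writes its data back."""

    def __init__(self, server_data):
        self._server_data = server_data  # stands in for data owned by the server
        self.data = list(server_data)    # local working copy

    def commit(self):
        # No snapshot and no comparison: the data is always sent back.
        self._server_data[:] = self.data


for _ in range(1000):
    meets_version("8.1", "7.0")
print(meets_version.cache_info().misses)  # 1: the comparison ran only once

server_side = [0.0, 0.0]
vec = AlwaysCommitVector(server_side)
vec.data[0] = 3.0
vec.commit()
print(server_side)  # [3.0, 0.0]
```

The trade-off in this sketch is that an unmodified vector is still written back, in exchange for skipping the copy and the full comparison.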

Benchmark
The test is taken from #2201. The parameters varied are:

  • pydpf-core version
  • server types: in process/gRPC
  • vector operations: data is only "gotten", or data is "gotten" and "set"

Server type   Vec operation   0.13.6    0.13.7.dev
In Process    Get             0.009 s   0.008 s
In Process    Get/Set         0.012 s   0.011 s
gRPC          Get             6.608 s   3.773 s
gRPC          Get/Set         8.707 s   3.707 s
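
To make the size of the change explicit, the speedup for each row can be computed from the timings in the table. The snippet below only copies those numbers; the variable names are illustrative.

```python
# Timings in seconds, copied from the benchmark table: (0.13.6, 0.13.7.dev).
timings = {
    ("In Process", "Get"): (0.009, 0.008),
    ("In Process", "Get/Set"): (0.012, 0.011),
    ("gRPC", "Get"): (6.608, 3.773),
    ("gRPC", "Get/Set"): (8.707, 3.707),
}

# Speedup = old time / new time; values above 1 mean 0.13.7.dev is faster.
speedups = {key: old / new for key, (old, new) in timings.items()}

for (server, operation), ratio in speedups.items():
    print(f"{server:<10} {operation:<8} {ratio:.2f}x")
```

For gRPC this gives roughly 1.75x for Get and 2.35x for Get/Set, while the in-process timings are nearly unchanged.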

@cbellot000 cbellot000 requested a review from a team as a code owner April 30, 2025 09:30
@cbellot000 cbellot000 linked an issue Apr 30, 2025 that may be closed by this pull request
codecov bot commented Apr 30, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.82%. Comparing base (facbfc2) to head (82e504a).
Report is 1 commits behind head on master.

✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2249      +/-   ##
==========================================
+ Coverage   83.75%   83.82%   +0.07%     
==========================================
  Files          90       90              
  Lines       10388    10396       +8     
==========================================
+ Hits         8700     8714      +14     
+ Misses       1688     1682       -6     


# The updated version of the DPF vector will always be committed to DPF.
# Ideally, this should be set to True only when modified, however this is not possible to do that efficiently.
# Consequently, for performance reasons, it's much better to always commit the vector to DPF rather than
Contributor

@cbellot000 is this also true in the case of a huge vector in gRPC on a remote machine with a high ping?

Contributor

My point is that performance improvements reported are for gRPC communications with a local server, is that right?

Contributor Author

Since it's twice as fast, I think it should be safe to assume that it will be as fast.

@cbellot000 cbellot000 merged commit 677b743 into master May 6, 2025
41 of 42 checks passed
@cbellot000 cbellot000 deleted the fix/perf_issue_dpf_vector branch May 6, 2025 15:13
Successfully merging this pull request may close these issues.

Performance of get/set field data
3 participants